Mental Health Assessment


MAQuA: Adaptive Question-Asking for Multidimensional Mental Health Screening using Item Response Theory

Varadarajan, Vasudha, Xu, Hui, Boehme, Rebecca Astrid, Mirstrom, Mariam Marlan, Sikstrom, Sverker, Schwartz, H. Andrew

arXiv.org Artificial Intelligence

Recent advances in large language models (LLMs) offer new opportunities for scalable, interactive mental health assessment, but excessive querying by LLMs burdens users and is inefficient for real-world screening across transdiagnostic symptom profiles. We introduce MAQuA, an adaptive question-asking framework for simultaneous, multidimensional mental health screening. Combining multi-outcome modeling on language responses with item response theory (IRT) and factor analysis, MAQuA selects, at each turn, the questions whose responses are most informative across multiple dimensions, optimizing diagnostic information, improving accuracy, and potentially reducing response burden. Empirical results on a novel dataset reveal that MAQuA reduces the number of assessment questions required for score stabilization by 50-87% compared to random ordering (e.g., achieving stable depression scores with 71% fewer questions and eating disorder scores with 85% fewer questions). MAQuA demonstrates robust performance across both internalizing (depression, anxiety) and externalizing (substance use, eating disorder) domains, with early stopping strategies further reducing patient time and burden. These findings position MAQuA as a powerful and efficient tool for scalable, nuanced, and interactive mental health screening, advancing the integration of LLM-based agents into real-world clinical workflows.
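The core IRT idea behind adaptive question selection can be sketched compactly. The following is a minimal illustration of maximum-information item selection under a two-parameter logistic (2PL) model with a single severity dimension; the item parameters and the unidimensional setup are illustrative assumptions, not MAQuA's actual multidimensional model:

```python
import numpy as np

def p_endorse(theta, a, b):
    """2PL IRT: probability of endorsing an item with
    discrimination a and difficulty b at severity theta."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at the current severity estimate."""
    p = p_endorse(theta, a, b)
    return a**2 * p * (1 - p)

def next_item(theta, items, asked):
    """Pick the unasked item with maximum information at theta."""
    best, best_info = None, -1.0
    for idx, (a, b) in enumerate(items):
        if idx in asked:
            continue
        info = item_information(theta, a, b)
        if info > best_info:
            best, best_info = idx, info
    return best

# Hypothetical item bank: (discrimination, difficulty) pairs.
items = [(1.2, -0.5), (0.8, 0.0), (1.5, 1.0), (1.0, 0.3)]
choice = next_item(theta=0.4, items=items, asked={0})
```

After each response, the severity estimate is updated and the selection repeats, which is what lets an adaptive test stabilize scores with far fewer questions than a fixed ordering.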


Holistix: A Dataset for Holistic Wellness Dimensions Analysis in Mental Health Narratives

Shakeel, Heba, Ahmad, Tanvir, Saxena, Chandni

arXiv.org Artificial Intelligence

We introduce a dataset for classifying wellness dimensions in social media user posts, covering six key aspects: physical, emotional, social, intellectual, spiritual, and vocational. The dataset is designed to capture these dimensions in user-generated content, with a comprehensive annotation framework developed under the guidance of domain experts. This framework also includes labeling text spans from these posts to provide explanations that highlight the corresponding wellness aspects. We evaluate both traditional machine learning models and advanced transformer-based models for this multi-class classification task, with performance assessed using precision, recall, and F1-score, averaged over 10-fold cross-validation. Post-hoc explanations are applied to ensure the transparency and interpretability of model decisions. The proposed dataset contributes to region-specific wellness assessments in social media and paves the way for personalized well-being evaluations and early intervention strategies in mental health. Mental health disorders have become a critical global health problem, with more than a billion people worldwide affected by mental, neurological, and substance use disorders [1].
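The evaluation protocol described here (10-fold cross-validation with averaged multi-class metrics) can be sketched without any particular model; the helper names below are hypothetical and the macro-averaged F1 is one common choice for averaging over classes:

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

def macro_f1(y_true, y_pred, labels):
    """Macro-averaged F1 over the given class labels."""
    scores = []
    for c in labels:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(scores))
```

A per-fold classifier would be fit on each `train` split, scored on the corresponding `test` split, and the fold scores averaged to produce the reported figures.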


Domain Adversarial Training for Mitigating Gender Bias in Speech-based Mental Health Detection

Kim, June-Woo, Yoon, Haram, Oh, Wonkyo, Jung, Dawoon, Yoon, Sung-Hoon, Kim, Dae-Jin, Lee, Dong-Ho, Lee, Sang-Yeol, Yang, Chan-Mo

arXiv.org Artificial Intelligence

Speech-based AI models are emerging as powerful tools for detecting depression and post-traumatic stress disorder (PTSD), offering a non-invasive and cost-effective way to assess mental health. However, these models often struggle with gender bias, which can lead to unfair and inaccurate predictions. This study addresses the issue by introducing a domain adversarial training approach that explicitly accounts for gender differences in speech-based depression and PTSD detection. Specifically, we treat the genders as distinct domains and integrate this information into a pretrained speech foundation model. We then validate the approach on the E-DAIC dataset to assess its impact on performance. Experimental results show that our method notably improves detection performance, increasing the F1-score by up to 13.29 percentage points over the baseline. This highlights the importance of addressing demographic disparities in AI-driven mental health assessment.
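Domain adversarial training is commonly implemented with a gradient reversal layer (Ganin and Lempitsky's formulation); the abstract gives no implementation details, so the following is a minimal numpy sketch of the reversal mechanics only, with the scaling factor and the way gradients are combined as assumptions:

```python
import numpy as np

class GradReverse:
    """Identity on the forward pass; on the backward pass the incoming
    gradient is negated and scaled by lam, so the shared encoder is
    pushed to *increase* the domain (gender) classifier's loss."""
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, features):
        # Features pass through unchanged to the domain head.
        return features

    def backward(self, grad_from_domain_head):
        # Reverse and scale the gradient flowing back into the encoder.
        return -self.lam * grad_from_domain_head

def encoder_gradient(task_grad, domain_grad, lam=1.0):
    """Combine the task head's gradient with the reversed domain-head
    gradient: the encoder learns depression/PTSD cues while unlearning
    gender-discriminative cues."""
    grl = GradReverse(lam)
    return task_grad + grl.backward(domain_grad)
```

In a real setup the reversal would sit between a pretrained speech encoder and a gender classifier head, with the task head (depression/PTSD) attached directly to the encoder.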


A Systematic Evaluation of LLM Strategies for Mental Health Text Analysis: Fine-tuning vs. Prompt Engineering vs. RAG

Kermani, Arshia, Perez-Rosas, Veronica, Metsis, Vangelis

arXiv.org Artificial Intelligence

This study presents a systematic comparison of three approaches for the analysis of mental health text using large language models (LLMs): prompt engineering, retrieval augmented generation (RAG), and fine-tuning. Using LLaMA 3, we evaluate these approaches on emotion classification and mental health condition detection tasks across two datasets. Fine-tuning achieves the highest accuracy (91% for emotion classification, 80% for mental health conditions) but requires substantial computational resources and large training sets, while prompt engineering and RAG offer more flexible deployment with moderate performance (40-68% accuracy). Our findings provide practical insights for implementing LLM-based solutions in mental health applications, highlighting the trade-offs between accuracy, computational requirements, and deployment flexibility.


Cognitive-Mental-LLM: Leveraging Reasoning in Large Language Models for Mental Health Prediction via Online Text

Patil, Avinash, Gedhu, Amardeep Kour

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated potential in predicting mental health outcomes from online text, yet traditional classification methods often lack interpretability and robustness. This study evaluates structured reasoning techniques, namely Chain-of-Thought (CoT), Self-Consistency (SC-CoT), and Tree-of-Thought (ToT), to improve classification accuracy across multiple mental health datasets sourced from Reddit. We analyze reasoning-driven prompting strategies, including zero-shot CoT and few-shot CoT, using key performance metrics such as balanced accuracy, F1 score, and sensitivity/specificity. Our findings indicate that reasoning-enhanced techniques improve classification performance over direct prediction, particularly in complex cases. Compared to baselines such as zero-shot non-CoT prompting, fine-tuned pre-trained transformers such as BERT and Mental-RoBERTa, and fine-tuned open-source LLMs such as Mental-Alpaca and Mental-Flan-T5, reasoning-driven LLMs yield notable gains on datasets like Dreaddit (+0.52% over M-LLM, +0.82% over BERT) and SDCNL (+4.67% over M-LLM, +2.17% over BERT). However, performance declines on the Depression Severity and CSSRS prediction tasks suggest dataset-specific limitations, likely due to our use of a more extensive test set. Among prompting strategies, few-shot CoT consistently outperforms the others, reinforcing the effectiveness of reasoning-driven LLMs. Nonetheless, dataset variability highlights challenges in model reliability and interpretability. This study provides a comprehensive benchmark of reasoning-based LLM techniques for mental health text classification, offering insights into their potential for scalable clinical applications while identifying key challenges for future improvements.
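The two strategies the abstract highlights, few-shot CoT prompting and self-consistency voting, can be illustrated without any model call. Everything below is a hypothetical sketch: the exemplar, the label, and the helper names are invented for illustration, and the actual LLM sampling step is omitted:

```python
from collections import Counter

# One hypothetical few-shot exemplar: a post paired with a worked
# reasoning chain that ends in a label.
FEW_SHOT = [
    ("Post: I can't sleep and nothing feels worth doing anymore.",
     "Reasoning: The post mentions persistent anhedonia and sleep "
     "problems, both common depression markers. Answer: depression"),
]

def build_cot_prompt(post, examples=FEW_SHOT):
    """Assemble a few-shot chain-of-thought prompt: each exemplar shows
    the reasoning pattern, and the target post ends at 'Reasoning:' so
    the model continues with its own chain before answering."""
    parts = []
    for ex_post, ex_answer in examples:
        parts.append(f"{ex_post}\n{ex_answer}")
    parts.append(f"Post: {post}\nReasoning:")
    return "\n\n".join(parts)

def self_consistency_vote(answers):
    """SC-CoT: sample several independent reasoning chains and take a
    majority vote over their final answers."""
    return Counter(answers).most_common(1)[0][0]
```

In practice `build_cot_prompt` would be sent to the model several times at a nonzero temperature, and `self_consistency_vote` applied to the extracted final answers.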


Artificial Intelligence and Mental Health

Communications of the ACM

One of the primary challenges faced by researchers and clinicians seeking to study mental health is that direct observation of indicators of mental health issues can be challenging, as a diagnosis often relies on either self-reporting of specific feelings or actions, or direct observation of a subject (which can be difficult due to time and cost considerations). That is why there has been a specific focus over the past two decades on deploying technology to help human clinicians identify and assess mental health issues. Between 2000 and 2019, 54 academic papers focused on the development of machine learning systems to help diagnose and address mental health issues were published, according to a 2020 article published in ACM Transactions on Computer-Human Interaction. Of the 54 papers, 40 focused on the development of a machine learning (ML) model based on specific data as their main research contribution, while seven were proposals of specific concepts, data methods, models, or systems, and three applied existing ML algorithms to better understand and assess mental health, or to improve the communication of mental health providers. A few of the papers described empirical studies of end-to-end ML systems or assessed the quality of ML predictions, while one paper specifically discussed design implications for user-centric, deployable ML systems.


AI Can Detect Signals for Mental Health Assessment

#artificialintelligence

AI can detect signals that are informative about mental health from questionnaires and brain scans. A study published today by an interdisciplinary collaboration, directed by Denis Engemann from Inria, demonstrates that machine learning from large population cohorts can yield "proxy measures" for brain-related health issues without the need for a specialist's assessment. The researchers took advantage of the UK Biobank, one of the world's largest and most comprehensive biomedical databases, which contains detailed and secure health-related data on the UK population. This work is published in the open-access journal GigaScience. Mental health issues have been increasing worldwide, with the WHO determining that there was a 13% increase in mental health conditions and substance abuse disorders between 2007 and 2017.